A dual graph neural network for drug–drug interactions prediction based on molecular structure and interactions

Expressive molecular representation plays critical roles in researching drug design, while effective methods are beneficial to learning molecular representations and solving related problems in drug discovery, especially for drug-drug interactions (DDIs) prediction. Recently, a lot of work has been put forward using graph neural networks (GNNs) to forecast DDIs and learn molecular representations. However, under the current GNNs structure, the majority of approaches learn drug molecular representation from one-dimensional string or two-dimensional molecular graph structure, while the interaction information between chemical substructure remains rarely explored, and it is neglected to identify key substructures that contribute significantly to the DDIs prediction. Therefore, we proposed a dual graph neural network named DGNN-DDI to learn drug molecular features by using molecular structure and interactions. Specifically, we first designed a directed message passing neural network with substructure attention mechanism (SA-DMPNN) to adaptively extract substructures. Second, in order to improve the final features, we separated the drug-drug interactions into pairwise interactions between each drug’s unique substructures. Then, the features are adopted to predict interaction probability of a DDI tuple. We evaluated DGNN–DDI on real-world dataset. Compared to state-of-the-art methods, the model improved DDIs prediction performance. We also conducted case study on existing drugs aiming to predict drug combinations that may be effective for the novel coronavirus disease 2019 (COVID-19). Moreover, the visual interpretation results proved that the DGNN-DDI was sensitive to the structure information of drugs and able to detect the key substructures for DDIs. These advantages demonstrated that the proposed method enhanced the performance and interpretation capability of DDI prediction modeling.


Introduction
With the rapid development of machine learning techniques, many AI technologies have been successfully applied in a variety of tasks for drug discovery, such as drug-drug interactions (DDIs) [1]. One of the main issues in these studies is how to learn expressive representation from molecular structure [2]. Most of the conventional molecular representation are based on hand-crafted features and heavily rely on time-consuming biological experimentations [3].
The most common molecular representation method called simplified molecular input line entry specification(SMILES), is the molecular linear notation that encodes the molecular topology on the basis of chemical rules [4,5], while this line of work suffered from insufficient labeled data for specific molecular tasks. More recently, among the promising deep learning architectures, graph neural networks (GNNs) have gradually emerged as a powerful candidate for modeling molecular data [6][7][8]. A molecule is naturally treated as a graph based on its geometry, where an atom serves as the node and a chemical bond serves as the edge. Therefore, a molecule is normally mapped to an undirected graph and defined as G = (V, E), where V and E are the sets of all atoms and chemical bonds in the molecule, respectively. Moreover, to better encode the interactions between atoms, a message passing neural network named MPNN was designed to utilize the attributed features of both atoms and edges [9]. It is a general framework for learning node embeddings or learning the entire graph representations. The MPNN used the basic molecular graph topology to obtain structural information through neighborhood feature aggregation and pooling methods [10,11].
DDIs prediction is one of the applications of molecular representation [12][13][14]. DDIs is referred to as a situation where the pleasant or adverse effects caused by the co-administration of two drugs, which may cause adverse drug events and side effects that damage the body [15,16]. In order to avoid such events, it's urgent to develop computational approaches to detect DDIs [17].
Various machine learning methods have been proposed and have greatly contributed to the DDIs prediction [18][19][20]. The vast majority of these techniques rely on the drug similarity assumptions, where it is believed that if drugs A and B interact to produce a specific biological effect, then drugs similar to drug A (or drug B) are likely to interact with drug B (or drug A) to produce the same effect. Drugs are, therefore, processed depending on their similarities in chemical structures; as well as additionally, in other features such as their individual side effects, targets, pathways [21,22].
Recently, many deep learning methods have been developed and have shown encouraging performance in DDIs prediction tasks [23][24][25][26]. For instance, Deng et al. [24] proposed a multi-modal deep learning framework combined diverse drug features to predict DDIs. Feng et al. [25] applied deep graph auto-encoder to learn latent drugs representations fed to a deep feedforward neural network for DDIs prediction. Liu et al. [27] introduced a deep attention neural network framework for drug-drug interaction prediction, which can effectively integrate multiple drug features. For adverse drug-drug interaction (ADDI), Zhu et al. [28] employed eight attributes and developed a discriminative learning algorithm to learn attribute representations of each adverse drug pair for exploiting their consensus and complementary information in multi-attribute ADDI prediction. And then they designed three dependence guided terms among molecular structure, side effect and ADDI to guide feature selection and put forward a discriminative feature selection model DGDFS for ADDI prediction [29]. Although these methods achieved inspiring results, there are still mostly unexplored in DDIs prediction tasks especially as far as feature extraction from the raw representations (i.e., chemical structures) of drugs are concerned. Most of the existing methods predict DDIs relying on the similarity assumption of drugs or on manually engineered features [34,35]. However, molecular structure-based methods regard drugs as independent entities, and predict DDIs only by relying on drug pairs. This is no need for external biomedical knowledge. It has been proven that DDIs usually depend only on a few substructures as a whole [36,37]. SSI-DDI [34] and GMPNN-CS [35], two recent methods that both leveraged the powerful feature extraction ability of deep learning, work directly on raw molecular chemical structures of drugs using GNNs. SSI-DDI used graph attention (GAT) layers [38] to learn a comprehensive feature representation of a drug from substructures, while GMPNN-CS introduced gated message passing neural network which learns chemical substructures with different sizes and shapes from the molecular graph representations of drugs for DDIs prediction. However, the gates are computed before the message passing, which means that they did not fully exploit the molecular structure information.
In this study, we proposed a DDIs prediction approach called DGNN-DDI that uses dual GNN to extract drug feature representation while taking into account drug substructure and the interaction information between chemical substructure. To extract the molecular substructures features, we first constructed a directed message passing neural network with substructure attention mechanism (SA-DMPNN) by fully considering the flexible-sized and irregularshaped of drug molecules substructures. Second, we used co-attention mechanism [39] to determine the importance weight by learning the interaction scores between the substructures features of the two drugs. After that, we concatenated the weighted substructures features for each drug to obtain the final feature, which was used to predict the potential interaction probability of the existing drugs and drugs. We evaluated our model using real-world dataset, and the experiments demonstrated that our DGNN-DDI is superior in predicting the potential DDIs. The method was applied to predict anti-COVID-19 drug combinations. The main contributions of this work are summarized as the following: 1. DGNN-DDI takes into account the key molecular substructure feature of the drugs, which is conducive to learning high-quality features.
2. DGNN-DDI leverages the interaction information between chemical substructure to identify substructures with interactions, which can enhance the final feature of drug and also contribute to improving the prediction accuracy of DDIs.
3. The method is applied to predict anti-COVID-19 drug combinations using real-world dataset.

Dataset
We evaluated the model performance in DrugBank [40], which is a unique bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information. The dataset contains 1706 drugs and 191808 DDIs tuples classified into 86 interaction types, which describes how one drug affects the metabolism of another one. Each drug is associated with their SMILES and we converted it into molecular graph using RDKit. In the dataset, each drug pair is only associated with a single type of interaction.

Comparative analysis with other methods
On the DrugBank dataset, we compared the proposed model with cutting-edge approaches to verify its efficacy. Only chemical structural information is taken into account by these approaches as an input, and combined drug-drug information is integrated in some way during the learning process.
• SA-DDI [44]: a GNN that used a message-passing neural network and a substructure-substructure interaction module to learn thorough and useful features. SA-DDI extracted features with message passing step T = 10 for DDIs prediction.
• SSI-DDI [34]: considered each node hidden features as sub-structures and then computes interactions between these substructures to determine the final DDI prediction. SSI-DDI stacked L = 4 layers of graph attention (GAT) for DDIs prediction.
• GMPNN-CS [35]: a GNN architecture that introduced gates message passing mechanism to control the flow of message passing of GNN. GMPNN-CS learned chemical substructures with message passing step T = 10 for DDIs prediction.
• GAT-DDI [35]: replaced the GNN architecture in GMPNN-CS with GAT for drug representations, which are directly used for DDI prediction. GAT-DDI learned chemical substructures with message passing step T = 10 for DDIs prediction. Table 1 summarizes metric scores of all prediction models, and results demonstrate that DGNN-DDI outperforms other methods on all metric scores for the DrugBank dataset, which show the effectiveness of the proposed DGNN-DDI for DDI prediction.
To further analyze the performances of prediction models, we used Fig 1A-1F to display all metric scores of different methods. These violin plots clearly show that DGNN-DDI produces better performances for these metrices than the competing methods.
Moreover, we conducted a statistical analysis to test the differences between DGNN-DDI and other state-of-the-art methods. We conducted statistical significance tests by using predicted scores, and paired t-test results are demonstrated in Fig 2. For the DDI prediction, the

PLOS COMPUTATIONAL BIOLOGY
proposed method DGNN-DDI significantly (p-value < 0.05) improves the performances compared to other state-of-the-art methods.
To further demonstrate the superiority of GNN-DDI, we set T = 3 or L = 3 for all comparison methods, which is consistent with DGNN-DDI. Fig 3 displays the ROC curves (receiver operating characteristic curves) and P-R curves (precision-recall curves) of all models. Clearly, DGNN-DDI performs best among all methods, demonstrating once more its strong potential for DDIs prediction.

Parameter analysis
The parameters T and L have a significant impact on the extraction of substructures with variable sizes and forms during the processing of molecular features learning. We tested all 25 combinations of steps T and layers L, as shown in   The size of the batch is particularly significant since the DGNN-DDI is sampled and trained in batches. It will be challenging to converge if the batch size is too small. While if the batch is too large, a large amount of computation is required. As shown in Fig 5A, we investigated the

PLOS COMPUTATIONAL BIOLOGY
impact of various batch sizes on the methodology. The method has the best performance when the batch size is equal to 256. As demonstrated in Fig 5B and 5C, we also looked into how hidden dimensions and learning rates affected the performance of the model. Moreover, we conducted a significance analysis to test the differences on different values of batch size, learning rate and hidden dimension, respectively. Using predicted score, we conducted statistical significance tests, the paired t-test results are demonstrated in Fig 5D, 5E and 5F. For the three parameters, when the h i 2 R 64 , lr = 1e − 4 and batch size is equal to 256, DGNN-DDI significantly (p-value < 0.05) improves the performances compared to other values.

Ablation study
In our designs, the successful construction of the DGNN-DDI highly relies on D-MPNN with substructure attention mechanism (SA-DMPNN) and interaction-aware substructure extraction (Multi-GNN). The substruction attention is used to extract substructures with arbitrary size and shape. The relevance of substructure interactions with co-attention is expected to enhance the model performance to the final DDI prediction. We conducted experiments by removing the substructure-attention mechanism (SA) and/or co-attention layer (CA). Table 2 summarizes the contributions of SA and CA. The results show that both SA and CA are necessary for DGNN-DDI. Fig 6A-6C also shows that the model performs poorly without SA, CA, or SA and CA, showing the necessity of SA and CA. Furthermore, we observed that the results decrease

PLOS COMPUTATIONAL BIOLOGY
greatly when applying either SA or CA. However, the improvement of using both is larger than the only one, highlighting the effectiveness of DGNN-DDI. Additionally, as demonstrated in Fig 6D-6F, SA and CA can expedite training while also enhancing generalization ability.

Visual explanations for DGNN-DDI
We conducted visual explanation-related experiments to rationalize the DGNN-DDI. To investigate how the atom hidden vectors evolved during the learning process, we obtained the similarity coefficient between atom pairs by measuring the Pearson correlation coefficient for those hidden vectors. We chose the hidden vectors after the last iteration (i.e., h i in Eq 12). Figs 7 and 8 give two drugs with their atom similarity matrices during the learning process. The cluster heat maps show some degree of chaos at the beginning and then clearly group into clusters during the learning process where the corresponding substructures for clusters are highlighted in the drugs. Taking Fig 7 as an example, we found that the atoms in sildenafil at epoch 50 approximately separate into three clusters. This finding is in accordance with our intuition regarding the sildenafil structure. These results suggest that the DGNN-DDI can capture the structure information of a molecule.
Furthermore, we investigated the performances of DGNN-DDI for each DDI type and calculated the metric scores for interaction types independently by using predicted scores and real labels. The performance metrics for each DDI type are shown in Fig 9. Among 86 DDI types, DGNN-DDI achieves the highest AUC scores and the highest AUPR scores in 80 DDI

Case study
The emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in 2019 has triggered an ongoing global pandemic of the severe pneumonia-like disease coronavirus disease 2019 (COVID- 19) [30]. However, the research and development of traditional medicines for the new coronavirus are very expensive in terms of time, manpower, and funds. We hypothesized that combining drugs with independent mechanisms of action could result in synergy against SARS-CoV-2, thus generating better antiviral efficacy [45]. We prioritized 73 combinations of 32 drugs with potential activity against SARS-CoV-2 and then tested them [46]. Twelve synergistic combination drugs were identified. To further investigate which substructure among the 12 synergistic instances contributes most significantly to medication synergistic combos, we visualized the most crucial substructures for combination drugs. Specifically, we first determine the indices (h, t) of the highest pairwise interaction score from Eq 1: This can be extended to top k pairwise interaction scores for further analysis. To keep the study simple, we used only the highest one(k = 1). (h, t) tells us that the substructures of concern are from the h-th Multi-GNN layer for d x and t-th Multi-GNN layer for d y , which were primarily responsible for the DDI outcome. We chose three atoms with the largest substructures attention, which were described by Eq 13, as the center of the most vital substructures. Fig 10 show the results of this case study. Contributions of substructures are presented as a heat map (map with green fill) of the molecular graph. Each row contains two pair of drugs, for each pair of drugs, the indices h, t means the radius of a substructures. Therefore, in the heat map, each substructure contribution is shown mainly concentrated around its center. We

PLOS COMPUTATIONAL BIOLOGY
can see that the drug nitazoxanide with remdesivir, amodiaquine, emetine dihydrochloride hydrate, arbidol and NCGC00411883-01 exhibiting significant synergy against SARS-CoV-2, which is consistent with the result of a previous study [46]. When synergistic with different drugs, the key substructures of drug nitazoxanide are basically the unified, and cresyl acetate ('CC (= O)Oc1ccccc1C') or part of it can be found in all of them. However, the key substructures of drug arbidol are vary wildly. We hypothesized that this variety might be caused by various chemical substructures that function in various ways in the medication combinations used to treat SARS-CoV-2, which was consistent with the notion put out that substructures with various weights affect DDI prediction. Overall, these results highlight the utility of drug repurposing and preclinical testing of drug combinations for discovering potential therapies to treat SARS-CoV-2. Additionally, Fig 11 displays

Conclusion
This paper presented a novel molecular structure-based deep learning model DGNN-DDI for predicting DDIs between a pair of drugs. The DGNN-DDI used the substructure attention and co-attention mechanism to obtain the substructure with irregular size and shape, and enhance the representation capability for the model. On DrugBank dataset, we contrasted the suggested model with cutting-edge approaches to confirm its superiority. Moreover, we visualized the atom similarity of certain molecules. Finally, we showed the key substructures for the SARS-CoV-2 drug combinations as a case study. The visual interpretation results showed that

Materials and method
This section gives the technical details of the DGNN-DDI. First, we defined the problem that has to be resolved. Then, we presented the input representation and all involved computational steps of our method. The overall framework is shown in Fig 12.

Problem formulation
The purpose of the DDIs prediction task is to develop an advanced model that takes two drugs and an interaction type as input and generates an output indicating whether there exists an interaction between them. Formally, given a dataset of DDIs M ¼ fðd x ; d y ; rÞ i g N i¼1 , where d x , d y is taken from the drugs set D, r denotes the interaction type between two drugs, taken from the interaction types set I. Our major objective is to find a model f: D × D × I ! {0, 1}, which predicts the probability that this type of interaction will occur between the two drugs.

Input representation
The input of the model is a DDI tuple (d x , d y , r). Drugs d x and d y are both represented by SMILES strings. We preprocessed the SMILES into graph using RDKit [47] as shown in Fig  13A, where the nodes representing atoms, while edges representing the bonds between the atoms. Therefore, a drug is typically defined as a molecular graph G = (V, E), where V ¼ fv i g n i¼1 is the set of nodes, and E ¼ fðv i ; v j Þ s g m s¼1 is the set of edges. Each node v i has a corresponding feature vector x i 2 R d . Similarly, while each edge e ij = (v i , v j ) has a feature vector x ij 2 R d 0 . The features used for atoms and bonds are given in the Table 3.

Graph neural network
When a graph is represented as G = (V, E), a GNN maps a graph G to a vector h G 2 R d usually with a message passing phase and readout phase. As shown in Fig 13B and 13C, the message passing phase updates node-level features by aggregating messages from their neighbor nodes in G, and the readout phase generates a graph-level feature vector by aggregating all the nodelevel features, which is used to predict the label of the graph. Message passing phase. Given a graph G, we denoted the feature of node v at step t as x ðtÞ v 2 R d . We then updated x ðtÞ v into x ðtþ1Þ v 2 R d using the following graph convolutional layer [9]: where N(v) denotes the neighbors of v in graph G. M t and U t are the message functions and node update functions, respectively.

PLOS COMPUTATIONAL BIOLOGY
Readout phase. To obtain a graph-level feature h G , readout operation integrates all the node features among the graph G is given in Eq 4: where R is readout function, and T is the final step. So far, the GNN is learned in a standard manner, which has third shortcomings for DDIs prediction. First, the GNN extracts fixed-size substructures with a predetermined number of layers, it is insufficient to capture the global structure of the molecules. As shown in Fig 14A, a GNN with two layers is unable to know whether the ring exists in the molecule. Therefore, to capture the structures make up of k-hop neighbors, the k graph convolutional layers should be stacked. Second, a well-constructed GNN should be able to preserve the local structure of a compound. As shown in Fig 14B, the methyl carboxylate moiety is crucial for methyl

PLOS COMPUTATIONAL BIOLOGY
decanoate and the GNN should distinguish it from the less essential substituents in order to make a reasonable inference. Concretely, it is necessary to apply the attention mechanism to the key substructures. Third, DDIs usually depend only on a few substructures of the whole chemical structures. As depicted in Fig 14C, the interaction type of 'blood calcium increased' between drug pair 'Carnitine' and 'Budesonide' is caused by their partial important substructures [48]. It is feasible to break down DDIs into substructure-substructure interactions. The following, we adopted directed message passing neural network with substructure attention mechanism (SA-DMPNN) and interaction-aware substructure extraction to solve these three limitations.

Directed message passing neural network with substructure attention mechanism
The idea of substructure attention is to extract substructures with arbitrary sizes and shapes and assign each substructure a unique score. We used the D-MPNN [2] with substructure attention mechanism(SA-DMPNN) for molecule substructures extraction. The process is shown in Fig 12B. During the t-th step, the SA-DMPNN extracts substructures with a radius of t.
In the SA-DMPNN, each node will receive a message from the bond-level hidden feature. For each node v i , the hidden feature at step t is h ðtÞ i ¼ x i , we used h ðtÞ ij to represent a bond-level hidden feature with each bond e i!j . We initialized the bond-level hidden features as where W i 2 R h×d , W j 2 R h×d , and W ij 2 R h×d 0 are learnable weight matrices.

PLOS COMPUTATIONAL BIOLOGY
At step t, we computed the bond-level neighborhood features for each node before utilizing a substructure-aware global pooling, then we obtained its bond -level graph representation g (t) . The corresponding calculation equations are thus: where SAGPooling [49] can be used to calculate β ij . Given a graph with bond-level feature matrix X and adjacency matrix A in which the nonzero position indicates that two bonds share a common node, SAGPooling computes the importance vector β ij as follows: GNN is an arbitrary GNN layer for calculating projection scores. For each bond-level graph representation g (t) , the substructure attention score can be computed as follows: where � represents dot product, w (t) is a weight vector for step t. In order to make the coefficients of different steps easy to compare, we normalized e (t) by using the softmax function: where each α (t) 2 R 1 indicates the importance of the substructures with a radius of t. After updating bond-level hidden features T steps, we returned the final representation of h ij by the weighted sum of bond-level hidden features across all steps according to the following: The substructure attention mechanism will make it possible that not all the nodes in a substructure are considered equally, refining even further the type of substructures being learned.
Finally, we returned to the node-level features by aggregating the incoming bond-level features as follows: where f is a multilayer perceptron, and h i contains the substructure information from different receptive fields centering at i-th atom.

Interaction-aware substructure extraction
As mentioned above, shallow convolutional layers cannot capture global structure of the molecules, in order to overcome this limitation, we stacked multiple SA-DMPNN blocks to obtain substructure-level graph representation. The stacking structure is referred to as Multi-GNN for the sake of simplicity in descriptions. The process is shown in Fig 12A. For a given drug d x , suppose we have obtained the node-level features for each node in molecular graph G x from the SA-DMPNN. At each Multi-GNN layer l, the features of each node are denoted as h ðlÞ i , then the representation of the substructure g ðlÞ x 2 G x is therefore given by the Eq 13, which is represented by aggregating the node features h ðlÞ i , each one weighted by a learnable coefficient β i , which can be interpreted as its importance. The β i can be obtained by the SAGPooling.
After obtaining all the substructure information g ðlÞ x and g ðlÞ y of the input drugs d x and d y from all the Multi-GNN layers, we employed a co-attention mechanism to account for the importance γ ij of each pairwise interaction between the substructures of G x and G y , which is given by: where b is a learnable weight vector, W x and W y are learnable weight matrices. To prevent situations where similar substructures are given high ratings, we applied various weight matrices. Furthermore, we updated the substructural features g ðiÞ x ; g ðjÞ y with γ ij , respectively, which is formulated as follows:ĝ Finally, the graph-level representation of d x can be computed by the following: The graph-level representation of d y (i.e., g y ) can be calculated by using computational steps similar to that described in Eq 17. As opposed to the global pooling that considers every substructure equally important, we utilized the interaction information to enhance structure information of d x and d y by assigning cross-substructure interaction scores. The overall computational steps for graph-level representation of d x and d y are depicted in Fig 15.

Drug-drug interaction prediction
Given a DDI tuple (d x , d y , r), the DDIs prediction can be expressed as the join probability of the tuple: where σ is the sigmoid function, and M r is the learnable matrix representation of interaction type r. The learning process of the model can be achieved by minimizing the cross-entropy loss function [50], which is given as follows: where y i = 1 indicates that an interaction exists between d x and d y , and vice versa; and p i is the predictive interaction probability of a DDI tuple is computed by using Eq 18.